NASA Task Load Index (NASA-TLX)

Version 1.0 

Paper and Pencil Package 

This booklet contains the materials necessary to collect subjective
workload assessments with the NASA Task Load Index. This procedure for
collecting workload ratings was developed by the Human Performance Group
at NASA Ames Research Center during a three-year research effort that
involved more than 40 laboratory, simulation, and inflight experiments.
Although the technique is still undergoing evaluation, this booklet is being
distributed to allow other researchers to use it in their own experiments.
Comments or suggestions about the procedure would be greatly appreciated.
This package is intended to fill a "nuts and bolts" function of describing the
procedure. A bibliography provides background information about previous
empirical findings and the logic that supports the procedure.

1. BACKGROUND 

The NASA Task Load Index is a multidimensional rating procedure
that provides an overall workload score based on a weighted average of
ratings on six subscales: Mental Demands, Physical Demands, Temporal
Demands, Own Performance, Effort, and Frustration. A definition of each
subscale is provided in Appendix A.

An earlier version of the scale had nine subscales. It was designed to
reduce between-rater variability by using the a priori workload definitions of
subjects to weight and average subscale ratings. This technique (referred to
as the "NASA Bipolar Rating Scale") was quite successful in reducing
between-rater variability, and it provided diagnostic information about the
magnitudes of different sources of load from subscale ratings (Hart, Battiste,
& Lester, 1984; Vidulich & Tsang, 1985a & b). However, its sensitivity
to experimental manipulations, while better than that of other popular
techniques and of a global unidimensional workload rating, was still not
considered sufficient. In addition, nine subscales were felt to be too
many, making the scale impractical to use in a simulation or operational
environment. Finally, several of the subscales were found to be irrelevant to
workload (e.g., Fatigue) or redundant (e.g., Stress and Frustration). For
these reasons, the NASA Task Load Index was developed. Some of the
subscales from the original scale were revised or combined, others were deleted,
and two were added. Three dimensions relate to the demands imposed on the
subject (Mental, Physical, and Temporal Demands) and three to the
interaction of a subject with the task (Effort, Frustration, and Performance).

Although it is clear that definitions of workload do indeed vary among
experimenters and among subjects (contributing to confusion in the workload
literature and to between-rater variability), it was found that the specific
sources of loading imposed by different tasks are an even more important
determinant of workload experiences. Thus, the current version of the scale
(the Task Load Index) combines subscale ratings that are weighted according
to their subjective importance to raters in a specific task, rather than
their a priori relevance to raters' definitions of workload in general.

2. DESCRIPTION 

2.1. General Information 

The degree to which each of the six factors contributes to the workload
of the specific task to be evaluated, from the raters' perspectives, is
determined by their responses to pair-wise comparisons among the six factors.
Magnitude ratings on each subscale are obtained after each performance of a
task or task segment. Ratings of factors deemed most important in creating
the workload of a task are given more weight in computing the overall
workload score, thereby enhancing the sensitivity of the scale.

The weights and ratings may or may not covary. For example, it is
possible for mental demands to be the primary source of loading for a task,
even though the magnitude of the mental demands might be low. Conversely,
the time pressure under which a task is performed might be the primary
source of its workload, and the time demands might be rated as being
high for some versions of the task and low for others.

Since subjects can give ratings quickly, it may be possible to obtain
them in operational settings. However, a videotaped replay or computer
regeneration of the operator's activities may be presented as a mnemonic aid
that can be stopped after each segment to obtain ratings retrospectively. It
was shown in a helicopter simulation and in a supervisory control simulation
(Hart, Battiste, Chesney, Ward, & McElroy, 1986; Haworth, Bivens, &
Shively, 1986) that little information was lost when ratings were given
retrospectively; a high correlation was found between ratings that were
obtained "online" and those that were obtained retrospectively with a visual
re-creation of the task.
The Task Load Index has been tested in a variety of experimental
tasks that range from simulated flight to supervisory control simulations and
laboratory tasks (e.g., the Sternberg memory task, choice reaction time,
critical instability tracking, compensatory tracking, mental arithmetic,
mental rotation, target acquisition, grammatical reasoning, etc.). The results
of the first validation study are summarized in Hart & Staveland (in press).
The derived workload scores have been found to have substantially less
between-rater variability than unidimensional workload ratings, and the
subscales provide diagnostic information about the sources of load.

2.2. Sources of Load (Weights) 

The NASA Task Load Index is a two-part evaluation procedure consisting
of both weights and ratings. The first requirement is for each rater
to evaluate the contribution of each factor (its weight) to the workload of a
specific task. These weights account for two potential sources of
between-rater variability: differences in workload definition between raters
within a task, and differences in the sources of workload between tasks. In
addition, the weights themselves provide diagnostic information about the
nature of the workload imposed by the task.

There are 15 possible pair-wise comparisons of the six scales
(Appendix B). Each pair is presented on a card. Subjects circle the
member of each pair that contributed more to the workload of that task.
The number of times that each factor is selected is tallied. The tallies can
range from 0 (not relevant) to 5 (more important than any other factor).
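As a quick check on the numbers above, a minimal Python sketch (Python is not used elsewhere in this booklet; the names `FACTORS`, `PAIRS`, and `appearances` are illustrative) enumerates the comparisons. Because each of the six factors is paired once with each of the other five, there are (6 x 5) / 2 = 15 cards, and no factor can be chosen more than 5 times.

```python
from itertools import combinations

# The six NASA-TLX subscales, in rating-sheet order.
FACTORS = [
    "Mental Demand",
    "Physical Demand",
    "Temporal Demand",
    "Performance",
    "Effort",
    "Frustration",
]

# Every unordered pair of distinct factors: C(6, 2) = 15 comparison cards.
PAIRS = list(combinations(FACTORS, 2))

# Each factor appears on exactly 5 cards (once against each other factor),
# so its tally can range from 0 (never chosen) to 5 (always chosen).
appearances = {f: sum(f in pair for pair in PAIRS) for f in FACTORS}

print(len(PAIRS))   # 15
print(appearances)  # every factor maps to 5
```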

A different set of weights is obtained for each distinctly different task
or task element upon its completion. The same set of weights can be used
for many different versions of the same task if the contributions of the six
factors to their workload are fairly similar. For example, the same set of
weights was used for many different versions of a target acquisition task in
which time pressure, target acquisition difficulty, and decision-making load
were varied. Obtaining separate weights for different experimental
manipulations increased the sensitivity of the derived workload score only
slightly, and did not warrant the additional time required to gather them. On
the other hand, the weights obtained from the same subjects for a compensatory
tracking task or a memory search task would not have been appropriate for
the target acquisition task.
2.3. Magnitude of Load (Ratings)

The second requirement is to obtain numerical ratings for each scale
that reflect the magnitude of that factor in a given task. The scales are
presented on a rating sheet (Appendix C). Subjects respond by marking
each scale at the desired location. In operational situations, rating sheets or
verbal responses are more practical, while a computerized version (available
from NASA Ames Research Center) is more efficient for most simulation
and laboratory settings. Ratings may be obtained either during a task, after
task segments, or following an entire task. Each scale is presented as a
12 cm line divided into 20 equal intervals anchored by bipolar descriptors
(e.g., High/Low). The 21 vertical tick marks on each scale divide the scale
from 0 to 100 in increments of 5. If a subject marks between two ticks, the
value of the right tick is used (i.e., round up).
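The scoring rule above can be expressed as a small Python sketch. It assumes, hypothetically, that the scorer measures each mark's distance in millimetres from the left end of the 12 cm line; with 20 intervals the ticks fall every 6 mm, and a mark between ticks takes the value of the tick on its right. The function and constant names are illustrative, not part of any NASA software.

```python
import math

SCALE_LENGTH_MM = 120.0                        # each scale is a 12 cm line
INTERVALS = 20                                 # 20 equal intervals, 21 ticks
TICK_SPACING_MM = SCALE_LENGTH_MM / INTERVALS  # 6 mm between ticks
POINTS_PER_TICK = 100 // INTERVALS             # 0-100 in steps of 5


def mark_to_rating(position_mm):
    """Convert a mark's distance from the left end of a scale (in mm) to a
    0-100 rating, using the value of the tick to its right if the mark falls
    between two ticks (i.e., rounding up)."""
    if not 0.0 <= position_mm <= SCALE_LENGTH_MM:
        raise ValueError(f"mark at {position_mm} mm is off the scale")
    # Round away floating-point noise first, so a mark lying exactly on a tick
    # is not pushed up to the next tick by a tiny representation error.
    tick = math.ceil(round(position_mm / TICK_SPACING_MM, 9))
    return tick * POINTS_PER_TICK


print(mark_to_rating(60.0))  # 50 (exactly on the middle tick)
print(mark_to_rating(60.5))  # 55 (between ticks, so rounded up)
```

The same conversion applies to the Performance scale; there, 0 corresponds to "Good" and 100 to "Poor".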

2.4. Weighting and Averaging Procedure 

The overall workload score for each subject is computed by multiplying
each rating by the weight given to that factor by that subject. The sum
of the weighted ratings for each task is divided by 15 (the sum of the
weights). (See Appendices D and E for a sample Tally Sheet and Worksheet.)
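The computation above can be written compactly. The symbols are introduced here only for convenience: $w_i$ is the weight (tally, 0 to 5) and $r_i$ the rating (0 to 100) of subscale $i$, and the six weights always sum to 15.

$$
\mathrm{OW} = \frac{\sum_{i=1}^{6} w_i r_i}{\sum_{i=1}^{6} w_i} = \frac{1}{15} \sum_{i=1}^{6} w_i r_i
$$

For a hypothetical subject with weights (Mental, Physical, Temporal, Performance, Effort, Frustration) = (5, 0, 4, 2, 3, 1) and ratings (70, 10, 60, 30, 55, 20):

$$
\mathrm{OW} = \frac{5(70) + 0(10) + 4(60) + 2(30) + 3(55) + 1(20)}{15} = \frac{835}{15} \approx 55.67
$$

Because the result is a weighted mean of the ratings, it always falls within their 0 to 100 range.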

3. EXPERIMENTAL PROCEDURE 

The usual sequence of events for collecting data with the NASA Task 
Load Index is as follows: 

3.1. Instructions 

Subjects read the scale definitions and instructions. A set of
generic instructions is included in Section 6. Some modifications
may be necessary depending on your situation.

3.2. Familiarization 

Subjects practice using the rating scales after performing a few
tasks to ensure that they have developed a standard technique
for dealing with the scales.

3.3. Ratings 

Subjects perform the experimental tasks, providing ratings on 
the six subscales following all task conditions of interest. The 
number of rating sheets needed equals the number of subjects X 
the number of task conditions (including practice).

3.4. Weights 

Subjects complete the "Sources-of-Workload Evaluation" once
for each task or group of tasks included in the experiment that
share a common structure (although difficulty levels may vary).
For example, in an experiment with several memory tasks and
several tracking tasks, two Sources-of-Workload Evaluations
would be performed: one for the memory tasks and one for the
tracking tasks. One set of cards should be made in advance of
the experiment for each subject X evaluation condition
combination. The pairs of factors should be cut apart and presented
individually in a different, randomly selected, order to each
subject. Subject instructions for doing the Sources-of-Workload
Evaluation are in Section 7. (Note that the exact time when the
weights are obtained is not critical. However, in order for them
to provide useful information, they must be obtained after at
least some exposure to the relevant task conditions.)
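Preparing the randomized card orders can be sketched in Python as below, assuming a computer is used to generate the presentation order rather than shuffling the cut cards by hand. `make_card_decks` and the subject and condition labels are hypothetical; `random.Random` with an optional seed makes the orders reproducible so they can be recorded with the data.

```python
import random
from itertools import combinations

FACTORS = ["Mental Demand", "Physical Demand", "Temporal Demand",
           "Performance", "Effort", "Frustration"]
PAIRS = list(combinations(FACTORS, 2))  # the 15 comparison cards


def make_card_decks(subject_ids, evaluation_conditions, seed=None):
    """Return {(subject, condition): list of the 15 cards in random order}.

    One deck is produced per subject X evaluation condition combination,
    each shuffled independently.
    """
    rng = random.Random(seed)  # pass a seed to make the orders reproducible
    decks = {}
    for subject in subject_ids:
        for condition in evaluation_conditions:
            deck = list(PAIRS)
            rng.shuffle(deck)
            decks[(subject, condition)] = deck
    return decks


# Hypothetical experiment: two subjects, memory tasks and tracking tasks.
decks = make_card_decks(["S01", "S02"], ["memory", "tracking"], seed=1)
for (subject, condition), deck in decks.items():
    print(subject, condition, "first card:", " or ".join(deck[0]))
```

Independent shuffles could in principle produce the same order twice, but with 15! possible orders this is vanishingly unlikely.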

3.5. Summary 

Following this procedure, you should end up with: (1) a set of workload
weights from each subject for each group of similar tasks, and (2) at
least one rating sheet for each subject for each experimental task.
Typically, we have run within-subject experiments and therefore ended up
with a larger number of rating sheets for each subject.

To conserve paper and speed up the subsequent analysis, we often 
enclose the Rating Sheet and the Sources-of-Workload comparison cards in 
clear plastic. Subjects mark the scales with an erasable felt tip marker. 
Immediately after they are marked, the experimenter transfers the responses 
onto the appropriate form or worksheet. Then the plastic sheets are cleaned 
and reused. If this procedure is followed, DOUBLE-CHECK YOURSELF
BEFORE ERASING THE SUBJECT'S RESPONSES!

4. DATA ANALYSIS PROCEDURE

The procedure for computing a weighted workload score follows: 
4.1. Tally Sheet

For each subject, the "Sources-of-Workload Tally Sheet"
(Appendix D) is used to compute the weight for each factor.
The scorer simply leafs through the evaluation cards and puts a
mark on the appropriate row of the tally column for each
response of the subject (e.g., each time the subject circled
"Mental Demand" on a comparison card, the experimenter would
put a mark in the "Mental Demand" row of the tally column).
After going through the Sources-of-Workload evaluation, the
experimenter adds the tallies for each scale and writes the totals
in the "Weight" column.
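The tallying step, including the two checks printed on the tally sheet in Appendix D (the total count must be 15 and no weight may exceed 5), can be sketched in Python. It assumes, hypothetically, that each subject's circled choices have already been transcribed into a list of factor names, one per card; `tally_weights` is an illustrative name, not part of the NASA software.

```python
from collections import Counter

# The six NASA-TLX subscales, in tally-sheet order.
FACTORS = ["Mental Demand", "Physical Demand", "Temporal Demand",
           "Performance", "Effort", "Frustration"]


def tally_weights(choices):
    """Convert the factor circled on each comparison card into weights.

    `choices` holds one factor name per card (15 in a complete evaluation).
    Raises ValueError if the checks on the tally sheet fail.
    """
    unknown = set(choices) - set(FACTORS)
    if unknown:
        raise ValueError(f"unrecognized factor(s): {sorted(unknown)}")
    counts = Counter(choices)
    # A factor that was never circled gets a weight of 0.
    weights = {factor: counts[factor] for factor in FACTORS}
    if sum(weights.values()) != 15:
        raise ValueError("total count is not 15; something was miscounted")
    if max(weights.values()) > 5:
        raise ValueError("no weight can be greater than 5")
    return weights


# Hypothetical responses from one subject.
choices = (["Mental Demand"] * 5 + ["Temporal Demand"] * 4
           + ["Effort"] * 3 + ["Performance"] * 2 + ["Frustration"])
print(tally_weights(choices))
```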

4.2. Worksheet

The Weight column from the tally sheet is then transferred to
the "Weighted Rating Worksheet" (Appendix E). Each subject's
weights are placed on a separate worksheet for the appropriate
task or set of similar tasks. If subjects rated more than one
task, the appropriate number of copies of the worksheet should be made.
Ratings are placed in the "Raw Rating" column of the worksheet.
The "Adjusted Rating" is formed by multiplying the Raw Rating
by the Sources-of-Workload Weight. The adjusted ratings are
summed across the different scales. The sum is divided by 15
to obtain the overall weighted workload score for the subject in
that one task condition.
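The worksheet arithmetic can be sketched in Python as follows. The function name and the example weights and ratings are hypothetical; the rule itself (Adjusted Rating = Weight X Raw Rating, overall score = sum of adjusted ratings / 15) is the one described above.

```python
# The six NASA-TLX subscales, in worksheet order.
FACTORS = ["Mental Demand", "Physical Demand", "Temporal Demand",
           "Performance", "Effort", "Frustration"]


def weighted_workload(weights, raw_ratings):
    """Fill in the Weighted Rating Worksheet for one subject in one task.

    weights:     factor -> Sources-of-Workload weight (0-5, summing to 15)
    raw_ratings: factor -> raw rating from the rating sheet (0-100)
    Returns (adjusted ratings, sum of adjusted ratings, weighted rating).
    """
    if sum(weights[f] for f in FACTORS) != 15:
        raise ValueError("weights must sum to 15")
    # Adjusted Rating = Weight X Raw Rating, for each scale.
    adjusted = {f: weights[f] * raw_ratings[f] for f in FACTORS}
    total = sum(adjusted.values())
    # Overall weighted workload score = (sum of adjusted ratings) / 15.
    return adjusted, total, total / 15


# Hypothetical weights and ratings for one subject and one task.
weights = {"Mental Demand": 5, "Physical Demand": 0, "Temporal Demand": 4,
           "Performance": 2, "Effort": 3, "Frustration": 1}
ratings = {"Mental Demand": 70, "Physical Demand": 10, "Temporal Demand": 60,
           "Performance": 30, "Effort": 55, "Frustration": 20}
adjusted, total, score = weighted_workload(weights, ratings)
print(total, round(score, 2))  # 835 55.67
```

Note that a factor never selected as a source of load (here Physical Demand, weight 0) contributes nothing to the score, however it was rated.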

The weighted ratings are then used as a dependent measure in whatever
type of analyses the experimenter chooses.

Figure 1 depicts the composition of a weighted workload score
graphically. The bar graph on the left represents six subscale ratings. The
width of the subscale bars reflects the importance of each factor (its weight),
and the height represents the magnitude of each factor (its rating) in a
particular task. The weighted workload score (the bar on the right) represents
the average area of the subscale bars.

4.3. Summary

The above procedure, although simple, can be laborious for a large
experiment. Thus, it is highly advantageous to computerize the procedure.
A set of programs that run on IBM-PC compatible machines has been
written to gather ratings and weights, and compute the weighted workload
scores. These are available upon request from NASA Ames Research
Center. However, if this is not a viable option, all the necessary materials
are included in this booklet. If you have any questions, comments, or
suggestions, please do not hesitate to contact us. This procedure is still
under evaluation and we are always looking for new ideas.

[Figure 1: Graphic example of the composition of a weighted workload score. Overall workload (OW) = mean of weighted ratings.]
6. SUBJECT INSTRUCTIONS: RATING SCALES

We are not only interested in assessing your performance but also the
experiences you had during the different task conditions. Right now we are
going to describe the technique that will be used to examine your
experiences. In the most general sense we are examining the "workload" you
experienced. Workload is a difficult concept to define precisely, but a simple
one to understand generally. The factors that influence your experience of
workload may come from the task itself, your feelings about your own
performance, how much effort you put in, or the stress and frustration you felt.
The workload contributed by different task elements may change as you get
more familiar with a task, perform easier or harder versions of it, or move
from one task to another. Physical components of workload are relatively
easy to conceptualize and evaluate. However, the mental components of
workload may be more difficult to measure.

Since workload is something that is experienced individually by each
person, there are no effective "rulers" that can be used to estimate the
workload of different activities. One way to find out about workload is to ask
people to describe the feelings they experienced. Because workload may be
caused by many different factors, we would like you to evaluate several of
them individually rather than lumping them into a single global evaluation of
overall workload. This set of six rating scales was developed for you to use
in evaluating your experiences during different tasks. Please read the
descriptions of the scales carefully. If you have a question about any of the
scales in the table, please ask me about it. It is extremely important that
they be clear to you. You may keep the descriptions with you for reference
during the experiment.

After performing each of the tasks, you will be given a sheet of rating
scales. You will evaluate the task by putting an "X" on each of the six scales
at the point which matches your experience. Each line has two endpoint
descriptors that describe the scale. Note that "own performance" goes from
"good" on the left to "bad" on the right. This order has been confusing for
some people. Please consider your responses carefully in distinguishing
among the different task conditions. Consider each scale individually. Your
ratings will play an important role in the evaluation being conducted; thus,
your active participation is essential to the success of this experiment and is
greatly appreciated by all of us.
7. SUBJECT INSTRUCTIONS: SOURCES-OF-WORKLOAD EVALUATION

Throughout this experiment the rating scales are used to assess your
experiences in the different task conditions. Scales of this sort are extremely
useful, but their utility suffers from the tendency people have to interpret
them in individual ways. For example, some people feel that mental or
temporal demands are the essential aspects of workload regardless of the effort
they expended on a given task or the level of performance they achieved.
Others feel that if they performed well the workload must have been low and
if they performed badly it must have been high. Yet others feel that effort or
feelings of frustration are the most important factors in workload, and so
on. Previous studies have already found every conceivable pattern of
values. In addition, the factors that create levels of workload
differ depending on the task. For example, some tasks might be difficult
because they must be completed very quickly. Others may seem easy or
hard because of the intensity of mental or physical effort required. Yet
others feel difficult because they cannot be performed well, no matter how
much effort is expended.

The evaluation you are about to perform is a technique that has been
developed by NASA to assess the relative importance of six factors in
determining how much workload you experienced. The procedure is simple: You
will be presented with a series of pairs of rating scale titles (for example,
Effort vs. Mental Demands) and asked to choose which of the items was
more important to your experience of workload in the task(s) that you just
performed. Each pair of scale titles will appear on a separate card.

Circle the Scale Title that represents the more important contributor
to workload for the specific task(s) you performed in this experiment.

After you have finished the entire series we will be able to use the
pattern of your choices to create a weighted combination of the ratings from
that task into a summary workload score. Please consider your choices
carefully and make them consistent with how you used the rating scales during
the particular task you were asked to evaluate. Don't think that there is any
correct pattern; we are only interested in your opinions.

If you have any questions, please ask them now. Otherwise, start 
whenever you are ready. Thank you for your participation. 


Appendix A. 



RATING SCALE DEFINITIONS

Title (Endpoints): Description

MENTAL DEMAND (Low/High): How much mental and perceptual
activity was required (e.g., thinking, deciding, calculating, remembering,
looking, searching, etc.)? Was the task easy or demanding, simple or
complex, exacting or forgiving?

PHYSICAL DEMAND (Low/High): How much physical activity was
required (e.g., pushing, pulling, turning, controlling, activating, etc.)?
Was the task easy or demanding, slow or brisk, slack or strenuous,
restful or laborious?

TEMPORAL DEMAND (Low/High): How much time pressure did you feel
due to the rate or pace at which the tasks or task elements occurred? Was
the pace slow and leisurely or rapid and frantic?

PERFORMANCE (Good/Poor): How successful do you think you were
in accomplishing the goals of the task set by the experimenter (or
yourself)? How satisfied were you with your performance in accomplishing
these goals?

EFFORT (Low/High): How hard did you have to work (mentally and
physically) to accomplish your level of performance?

FRUSTRATION LEVEL (Low/High): How insecure, discouraged, irritated,
stressed, and annoyed versus secure, gratified, content, relaxed, and
complacent did you feel during the task?
Appendix B.


Sources-of-Workload Comparison Cards 
Effort or Performance

Temporal Demand or Frustration

Temporal Demand or Effort

Physical Demand or Frustration

Performance or Frustration

Physical Demand or Temporal Demand

Physical Demand or Performance

Temporal Demand or Mental Demand

Frustration or Effort

Performance or Mental Demand

Performance or Temporal Demand

Mental Demand or Effort

Mental Demand or Physical Demand

Effort or Physical Demand

Frustration or Mental Demand
Appendix C.

RATING SHEET

Subject ID: ____________          Task ID: ____________

MENTAL DEMAND
Low  |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  High

PHYSICAL DEMAND
Low  |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  High

TEMPORAL DEMAND
Low  |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  High

PERFORMANCE
Good |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  Poor

EFFORT
Low  |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  High

FRUSTRATION
Low  |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|  High
Appendix D.

Subject ID: ____________          Date: ____________

SOURCES-OF-WORKLOAD TALLY SHEET

Scale Title          Tally          Weight
MENTAL DEMAND        ________       ______
PHYSICAL DEMAND      ________       ______
TEMPORAL DEMAND      ________       ______
PERFORMANCE          ________       ______
EFFORT               ________       ______
FRUSTRATION          ________       ______

                     Total count =  ______

(NOTE - The total count is included as a check. If
the total count is not equal to 15, then something has
been miscounted. Also, no weight can have a value
greater than 5.)
Appendix E.

Subject ID: ____________          Task ID: ____________

WEIGHTED RATING WORKSHEET

Scale Title          Weight    Raw Rating    Adjusted Rating
                                             (Weight X Raw)
MENTAL DEMAND        ______    ______        ______
PHYSICAL DEMAND      ______    ______        ______
TEMPORAL DEMAND      ______    ______        ______
PERFORMANCE          ______    ______        ______
EFFORT               ______    ______        ______
FRUSTRATION          ______    ______        ______

Sum of "Adjusted Rating" Column = ______

WEIGHTED RATING = ______
(i.e., (Sum of Adjusted Ratings)/15)